Identification of shared gene signatures in major depressive disorder and triple-negative breast cancer

Background Patients with major depressive disorder (MDD) have an increased risk of breast cancer (BC), implying that these two diseases share similar pathological mechanisms. This study aimed to identify the key pathogenic genes that lead to the occurrence of both triple-negative breast cancer (TNBC) and MDD. Methods Public datasets GSE65194 and GSE98793 were analyzed to identify differentially expressed genes (DEGs) shared by both datasets. A protein-protein interaction (PPI) network was constructed using STRING and Cytoscape, and key PPI genes were identified using cytoHubba. Hub DEGs were obtained from the intersection of hub genes from the PPI network with genes in the disease-associated modules of the Weighted Gene Co-expression Network Analysis (WGCNA). Independent datasets (TCGA and GSE76826) and RT-qPCR validated hub gene expression. Results A total of 113 overlapping DEGs were identified between TNBC and MDD. The PPI network was constructed, and 35 hub genes were identified. Through WGCNA, the blue, brown, and turquoise modules were recognized as highly correlated with TNBC, while the brown, turquoise, and yellow modules were similarly correlated with MDD. Notably, G3BP1, MAF, NCEH1, and TMEM45A emerged as hub DEGs because they appeared both in these modules and among the PPI hub genes. Within the GSE65194 and GSE98793 datasets, G3BP1 and MAF exhibited a significant downregulation in TNBC and MDD groups compared to the control, whereas NCEH1 and TMEM45A demonstrated a significant upregulation. These findings were further substantiated by TCGA and GSE76826, as well as through RT-qPCR validation. Conclusions This study identified G3BP1, MAF, NCEH1 and TMEM45A as key pathological genes in both TNBC and MDD.


Introduction
Breast cancer (BC) is one of the most common and fatal malignancies worldwide [1]. Considering the long latency and young age of onset, it is important to identify individuals susceptible to BC. Various genetic and environmental factors have been shown to elevate the risk of BC. Emerging evidence suggests that patients with major depressive disorder (MDD) have an increased risk of BC [2], and that genetic predisposition to MDD is causally associated with BC risk [3]. This implies that certain genetic and molecular pathogenic factors of MDD may also contribute to the development of BC. In 2011, the St. Gallen Expert Consensus divided BC into four subtypes [luminal A (expressing the oestrogen receptor, ER+), luminal B (ER+), HER2+ (without ER expression, ER-), and triple-negative breast cancer (TNBC, ER-)] [1,4], of which TNBC is the most aggressive subtype and has the worst prognosis [5]. Recent research has shown that MDD is a risk factor for ER-negative breast cancer [3]. Thus, we hypothesized that MDD and TNBC share common pathological mechanisms.
MDD is caused by both genetic and environmental factors, and its etiology involves multiple systems, including the endocrine, nervous, and immune systems [6][7][8]. Although emerging genetic and epidemiological evidence suggests that MDD and BC may have similar etiological mechanisms, the specific pathogenic pathways and molecules that they share have not yet been clearly elucidated. In this study, we identified differentially expressed genes (DEGs) that are commonly associated with both MDD and TNBC by analyzing high-throughput sequencing data from public databases, selected hub DEGs, and further validated their biological importance in functional assays.
Blood samples from five patients with MDD and five matched healthy individuals, as well as tumor tissue samples and normal adjacent tissue samples from five patients with BC, were collected between January 2022 and March 2023.

Protein-protein interaction (PPI) network construction and identification of hub genes
An interaction network of the common DEGs was constructed using STRING (v.11.0, http://string-db.org/), with the minimum required interaction score set to 0.4, and the results were visualized using Cytoscape (v.3.9.0, http://www.cytoscape.org/) [16]. The CytoHubba plug-in of Cytoscape [17] was employed to identify hub genes, and four topological analysis methods were applied: Maximal Clique Centrality (MCC), Maximum Neighborhood Component (MNC), degree, and Edge Percolated Component (EPC). The top 50 genes from each method were intersected, and the genes present in all four sets were chosen as hub genes.
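The consensus step described above (take the top-ranked genes from each topological method, then keep only those found in every list) can be sketched as follows. This is a minimal illustration, not the CytoHubba implementation: the function names `top_n` and `consensus_hubs` and the toy scores are hypothetical, and the alphabetical tie-breaking rule is an assumption.

```python
from typing import Dict, List, Set


def top_n(scores: Dict[str, float], n: int) -> Set[str]:
    """Return the n highest-scoring genes (ties broken alphabetically; an assumption)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:n])


def consensus_hubs(rankings: Dict[str, Dict[str, float]], n: int = 50) -> List[str]:
    """Genes that appear in the top n of every ranking method."""
    top_sets = [top_n(scores, n) for scores in rankings.values()]
    return sorted(set.intersection(*top_sets))


# Toy scores for four methods (hypothetical values, not data from the study).
toy_rankings = {
    "MCC": {"A": 9, "B": 7, "C": 5, "D": 1},
    "MNC": {"A": 4, "B": 3, "C": 1, "D": 2},
    "Degree": {"A": 6, "B": 5, "C": 2, "D": 4},
    "EPC": {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1},
}

hubs = consensus_hubs(toy_rankings, n=3)
print(hubs)  # ['A', 'B']
```

In the toy example, genes C and D each reach the top three under only two of the four methods, so they are excluded from the consensus set.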

Weighted gene co-expression network analysis (WGCNA) and selection of hub DEGs
The R package WGCNA (v.1.72-1, https://cran.r-project.org/web/packages/WGCNA/index.html) [18] was applied to perform weighted gene co-expression network analysis (WGCNA). We selected a height cut corresponding to a correlation of 0.99, and the minimum module size was set to 100. Modules that were significantly positively correlated with TNBC or MDD (|Pearson correlation coefficient (PCC)| > 0.3) were selected, and the genes in these modules were intersected with the previously selected hub genes. Genes present in both sets were identified as hub DEGs for the two diseases. Based on the expression levels of the hub DEGs, principal component analysis (PCA) was conducted to assess whether their expression was disease-specific. The expression levels of the hub DEGs were extracted from all four datasets and compared between disease and control samples within each dataset.
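The module selection and intersection logic above can be illustrated with a small Python sketch. It is not the WGCNA R code used in the study: it assumes module eigengenes have already been computed, uses a plain Pearson correlation with the 0.3 cutoff, and omits the significance test. All names (`pearson`, `positive_modules`, `hub_degs`) and all toy values are hypothetical.

```python
import math
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def positive_modules(eigengenes: Dict[str, List[float]],
                     trait: List[float],
                     min_r: float = 0.3) -> Dict[str, float]:
    """Modules whose eigengene correlates positively with the trait (r > min_r)."""
    selected = {}
    for name, values in eigengenes.items():
        r = pearson(values, trait)
        if r > min_r:
            selected[name] = r
    return selected


def hub_degs(genes_a: Dict[str, Set[str]], selected_a: Dict[str, float],
             genes_b: Dict[str, Set[str]], selected_b: Dict[str, float],
             ppi_hubs: Set[str]) -> List[str]:
    """Genes in selected modules of both diseases that are also PPI hub genes."""
    pool_a = set().union(*(genes_a[m] for m in selected_a))
    pool_b = set().union(*(genes_b[m] for m in selected_b))
    return sorted(pool_a & pool_b & ppi_hubs)


# Toy data: 1 = disease sample, 0 = control (hypothetical values).
trait = [1, 1, 1, 0, 0, 0]
eigengenes_a = {
    "blue": [0.5, 0.4, 0.3, -0.4, -0.5, -0.3],
    "grey": [0.1, -0.1, 0.0, 0.1, -0.1, 0.0],
    "green": [-0.5, -0.4, -0.3, 0.4, 0.5, 0.3],
}
eigengenes_b = {"yellow": [0.6, 0.2, 0.4, -0.5, -0.3, -0.4]}
module_genes_a = {"blue": {"g1", "g2", "g3"}, "grey": {"g4"}, "green": {"g5"}}
module_genes_b = {"yellow": {"g2", "g3", "g6"}}

selected_a = positive_modules(eigengenes_a, trait)
selected_b = positive_modules(eigengenes_b, trait)
result = hub_degs(module_genes_a, selected_a, module_genes_b, selected_b,
                  ppi_hubs={"g3", "g4", "g6"})
print(sorted(selected_a), result)  # ['blue'] ['g3']
```

Only the positively correlated module is kept for the first toy disease; the uncorrelated and negatively correlated modules are dropped before the gene sets are intersected.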

Validation of hub DEGs in independent external datasets and using real-time quantitative reverse transcription-polymerase chain reaction (RT-qPCR)
To enhance the robustness of our findings, we validated the expression levels of hub DEGs using the TCGA and GSE76826 datasets, complemented by RT-qPCR analysis. Expression data for breast cancer were retrieved from the TCGA database, encompassing 158 TNBC samples and 113 normal control samples profiled on the Illumina HiSeq 2000 RNA Sequencing platform. The GSE76826 dataset was generated using the GPL17077 platform (Agilent-039494 SurePrint G3 Human GE v2 8×60K Microarray 039381, probe name version), and the samples annotated as "Female" comprised 11 MDD samples and 7 controls.
The "TransZol Up" reagent was used to isolate total RNA from blood or tissue samples (TransGen Biotech Inc., Beijing, China, ET111-01). The SYBR Green RT-PCR assays were performed following the manufacturer's instructions for "First-Strand cDNA Synthesis SuperMix for qPCR" (TransGen Biotech Inc., AU341-02) and "PerfectStart® Green qPCR SuperMix" (TransGen Biotech Inc., AQ601-02). Reactions were run on a LongGene Q2000B system (LongGene Inc., Hangzhou, Zhejiang, China). Three independent experiments were conducted, and all assays were performed in triplicate. The 2^(−ΔΔCt) method was applied for the relative quantification of gene expression levels. The primer sequences are listed in Table 1.
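As a worked illustration of the 2^(−ΔΔCt) calculation, the sketch below computes a fold change from triplicate Ct values. It assumes a single reference (housekeeping) gene and roughly 100% amplification efficiency; the text does not name the reference gene, so the function name `relative_expression` and all Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_case: Sequence[float],
                        ct_ref_case: Sequence[float],
                        ct_target_ctrl: Sequence[float],
                        ct_ref_ctrl: Sequence[float]) -> float:
    """Fold change of a target gene in case vs control by the 2^(-ddCt) method."""
    # dCt: target Ct normalised to the reference gene within each group.
    dct_case = mean(ct_target_case) - mean(ct_ref_case)
    dct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    # ddCt: difference between the case and control dCt values.
    ddct = dct_case - dct_ctrl
    return 2 ** (-ddct)


# Hypothetical triplicate Ct values (not data from the study).
fold = relative_expression(
    ct_target_case=[24.0, 24.2, 23.8],
    ct_ref_case=[18.0, 18.0, 18.0],
    ct_target_ctrl=[25.0, 25.0, 25.0],
    ct_ref_ctrl=[18.0, 18.0, 18.0],
)
print(round(fold, 3))  # 2.0
```

Here the target gene crosses the threshold one cycle earlier in the case group after normalisation (ΔΔCt = −1), which corresponds to an approximately two-fold higher expression.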

PPI network construction and identification of hub genes
A total of 266 pairs of interactions were potentially related to the 113 DEGs, according to the STRING database (Fig. 3A). The intersection of results from the four different topological methods generated 35 hub genes for this network (Fig. 3B).

WGCNA and selection of hub DEGs
For both the TNBC and MDD training sets, 6 was chosen as the optimal soft threshold power to build the scale-free weighted co-expression network, with a mean connectivity of 1 (Fig. 4A and C), qualifying it as a small-world network. After merging the modules with highly similar gene expression, ten and five modules were obtained in the co-expression networks for the TNBC and MDD training sets, respectively (Fig. 4B and D). The module-trait relationships between diseases and modules were further analyzed (Fig. 4B and D). Next, we extracted genes from the modules that were significantly positively correlated with TNBC (TNBC-blue, TNBC-brown, and TNBC-turquoise) and MDD (MDD-brown, MDD-turquoise, and MDD-yellow) and identified 673 genes that were correlated with both diseases (Fig. 4E). These 673 genes were intersected with the previously identified 35 hub genes, and the four genes present in both sets, namely G3BP1, MAF, NCEH1 and TMEM45A, were chosen as hub DEGs. The principal component analysis (PCA) diagram separated the samples into distinct groups according to the expression levels of the hub DEGs (Fig. 4F).

Table 1
Primer sequences
Gene | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3')
… | ggccacaaagtatttcctgaag | gggtgttcacattttgctgata
TMEM45A | accaatgactcagaagggaaaa | ttttggaaccaagatagcaggt

Validation of the expression levels of the hub DEGs
In all datasets, the expression of G3BP1 and MAF was significantly downregulated in the TNBC and MDD groups compared to that in the control group, and the expression of NCEH1 and TMEM45A was significantly upregulated in the TNBC and MDD groups compared to that in the control group (Fig. 5A). In addition, the same trend was observed in the blood of patients with MDD and in tissue samples from patients with BC using RT-qPCR (Fig. 5B and C). The detailed baseline characteristics of the patients and healthy subjects from whom blood and tissue samples were collected are listed in Tables 2 and 3.

Discussion
In this study, we identified 20 genes that were upregulated in both TNBC and MDD and 93 genes that were downregulated in both diseases. From these 113 genes, PPI network construction and WGCNA led to the identification of four hub DEGs that may play the most critical roles in the pathogenesis of both diseases, namely G3BP1, MAF, NCEH1 and TMEM45A. Subsequently, their expression levels in public datasets as well as in blood and breast tissue samples were validated. Hsa-miR-34c and hsa-miR-16 were predicted to regulate the expression of hub DEGs using the HMDD database. GO analyses revealed that the common DEGs of both diseases were functionally related to biological processes such as "apoptotic process," "negative regulation of blood vessel endothelial cell migration," and "xenobiotic metabolic process." Moreover, they were enriched in molecular functions including "zinc ion binding," "gluconokinase activity," and "metalloendopeptidase activity." Genetic variants of xenobiotic metabolism genes have been associated with the risk of breast cancer [21]. Studies on neurodegenerative disorders, including depression, have provided evidence that xenobiotic-metabolizing enzymes have important functions in brain physiology [22]. NCEH1, a common DEG in GO-BP, encodes a multifunctional enzyme that hydrolyzes the amide and ester bonds of many xenobiotic chemicals [23]. In addition, gluconokinases such as IDNK have been reported to promote cancer cell proliferation and inhibit apoptosis [24]. Furthermore, a number of studies on MDD have implicated glucose metabolic dysfunction in the pathophysiology of MDD, although the underlying molecular mechanisms remain elusive [25]. Common DEGs in this GO-MF term included IDNK. Moreover, certain matrix metalloproteinases belonging to the metalloendopeptidase family, such as MMP-9, have been found to contribute significantly to the pathophysiology of depression [26] and have also been identified as therapeutic targets for metastatic breast cancer [27]. In the present study, MMP-9 was found to exhibit "metalloendopeptidase activity." In addition, KEGG analysis revealed that the common DEGs were enriched in 28 pathways including "transcriptional misregulation in cancer," "metabolic pathways," "endometrial cancer," "prolactin signaling pathway," and "MAPK signaling pathway." Prolactin was observed to promote bone metastasis in breast cancer patients, possibly by stimulating lytic osteoclast formation [28], and data from an animal model of MDD also supported the pathological role of prolactin in MDD [29]. MAPK signaling is one of the aberrantly activated oncogenic pathways in breast cancer [30]. This pathway was also implicated in the activation of the pro-inflammatory transcription factor NF-kappaB, potentially contributing to the pathogenesis of MDD, in which inflammation is a key pathological element [31]. In summary, the results from the functional enrichment analyses of our study are in line with previous findings and suggest that the above-mentioned pathways and biological activities may play important roles in the pathology of both TNBC and MDD.
From further PPI network construction and WGCNA, we obtained four hub DEGs (G3BP1, MAF, NCEH1 and TMEM45A), and using the HMDD database we found that hsa-miR-34c and hsa-miR-16 possibly regulate the hub DEGs. Of the four hub DEGs, NCEH1 (which encodes a versatile enzyme involved in diverse metabolic processes) attracted our attention. In addition to its role in xenobiotic metabolism, this enzyme is critical for providing cholesterol for the synthesis of bile acids (BAs) because of its ability to hydrolyze cholesterol esters [32]. Abnormal serum levels or altered composition of BAs have been implicated in the pathogenesis of both BC and MDD [33,34]. Our results showed that NCEH1 expression was significantly elevated in both TNBC and MDD, indicating that it might play a pathogenic role in these two diseases. It will be important to determine how its expression levels affect BA synthesis and secretion and whether altered BA levels result in the development of TNBC and MDD. It is also possible that the major pathogenic effects of NCEH1 overexpression are mediated by other molecules and pathways, which warrants further exploration. Furthermore, hsa-miR-34c and hsa-miR-16 were predicted to regulate the expression of NCEH1 in the HMDD database, and ample evidence has demonstrated the important roles of these two miRNAs in TNBC and MDD. Low plasma levels of hsa-miR-34c have been reported to be associated with poor prognosis in TNBC [35], and hsa-miR-34c has also been found to suppress TNBC invasiveness and epithelial-mesenchymal transition [36]. Hsa-miR-16 has been shown to inhibit the proliferation, invasion, and migration of TNBC cells [37,38] and has diagnostic value for TNBC [39]. Additionally, significant negative correlations have been identified between hsa-miR-34c levels and MDD symptoms [40], and both blood and cerebrospinal fluid levels of hsa-miR-16 have been found to be significantly downregulated in patients [41][42][43]. Neither of these miRNAs has been experimentally validated to regulate NCEH1 so far. Thus, future assays are needed to verify their regulatory relationships in vivo.
Our study had some limitations. First, the clinical information available in public databases is limited and not all datasets are of substantial size, which could introduce bias into our findings. Second, the sample size used in our experiments was relatively small, necessitating further validation through larger-scale studies. Lastly, there is insufficient evidence to conclusively designate G3BP1, MAF, NCEH1, and TMEM45A as potential therapeutic targets for TNBC and MDD. This hypothesis requires future clinical trials for verification.
In conclusion, G3BP1, MAF, NCEH1 and TMEM45A, which may be regulated by hsa-miR-34c and hsa-miR-16, may play critical roles in the pathogenesis of TNBC and MDD. Xenobiotic metabolism, gluconokinase, matrix metalloproteinase, prolactin and MAPK signaling pathways, and bile secretion may underlie the development of these two diseases. These findings provide novel insights for future research on biomarkers and therapeutic targets of both diseases.

Fig. 1 Flow diagram of the study

Fig. 2 Identification of common differentially expressed genes (DEGs) and hub genes. (A) Volcano plots showing DEGs. Red and blue dots indicate significantly up-regulated and down-regulated genes, respectively, in the disease group compared to the control. (B) Venn diagrams displaying the common DEGs that were down-regulated in both diseases (left) and up-regulated in both diseases (right). (C) Enrichment analyses of the common DEGs. All the identified Gene Ontology terms and the top 15 enriched Kyoto Encyclopedia of Genes and Genomes pathways are shown.

Fig. 3 Protein-protein interaction (PPI) network construction and selection of hub genes. (A) PPI network of the common DEGs. (B) A Venn diagram showing the hub genes derived from four different methods.

Fig. 4 Weighted gene co-expression network analysis (WGCNA) and selection of hub DEGs. (A) Selection of the optimal soft threshold power in GSE65194. (B) A clustering dendrogram (left) and a heat map (right) showing the correlations between WGCNA modules and the disease in GSE65194. (C) Selection of the optimal soft threshold power in GSE98793. (D) A clustering dendrogram (left) and a heat map (right) showing the correlations between WGCNA modules and the disease in GSE98793. (E) A Venn diagram showing the intersection of DEGs from disease-associated WGCNA modules. (F) A principal component analysis diagram illustrating the separation of samples into distinct groups based on the expression levels of hub DEGs.

Fig. 5 Validation and annotation of hub DEGs and their potential interactions with miRNAs. (A) Expression levels of hub DEGs in the training and validation sets. (B) Expression levels of hub DEGs in the tissue samples measured by real-time quantitative reverse transcription-polymerase chain reaction (RT-qPCR). (C) Expression levels of hub DEGs in the blood samples measured by RT-qPCR. * indicates p < 0.05, ** indicates p < 0.01, and *** indicates p < 0.001.

Fig. 6 "microRNA-hub DEG" interactions and functional annotation of hub DEGs.A Sankey diagram showing the regulatory relationships between disease-associated miRNAs and hub DEGs, and the annotated Kyoto Encyclopedia of Genes and Genomes pathways of the hub DEGs.

Table 2
Baseline characteristics of the breast cancer patients

Table 3
Baseline characteristics of the major depressive disorder patients and the matched healthy individuals. BMI: body mass index; HAMD: Hamilton Depression Rating Scale